The method of regularization with the Gaussian reproducing kernel
is popular in the machine learning literature and successful in
many practical applications. In this paper we consider the periodic 
version of the Gaussian kernel regularization. We show that, in the white
noise model setting and in function spaces of very smooth functions,
such as the infinite-order Sobolev space and the space of analytic
functions, the method under consideration is asymptotically
minimax; in finite-order Sobolev spaces, the method is rate optimal, 
and the efficiency in terms of constant when compared with the min- 
imax estimator is reasonably high. The smoothing parameters in the 
periodic Gaussian regularization can be chosen adaptively without 
loss of asymptotic efficiency. The results derived in this paper give a 
partial explanation of the success of the Gaussian reproducing kernel 
in practice. Simulations are carried out to study the finite sample 
properties of the periodic Gaussian regularization. 

1. Introduction. The method of regularization is a popular approach for 
nonparametric function estimation. Let $f$ be the nonparametric function to
be estimated. The method of regularization takes the form 

(1)    $\min_{f \in \mathcal{F}} [L(f, \mathrm{data}) + \lambda J(f)]$,

where L is the empirical loss, often taken to be the negative log-likelihood, 
and $J(f)$ is the penalty functional, usually a quadratic functional
corresponding to a norm or semi-norm of a reproducing kernel Hilbert space
$\mathcal{F}$. Most often the penalty functional is chosen so that smoother
functions incur smaller penalty. The smoothing parameter $\lambda$ controls
the tradeoff between minimizing the empirical loss and obtaining a smooth
solution. For a
concrete example, let us look at the regression model 

(2)    $y_j = f(x_j) + \varepsilon_j$,  $j = 1, \dots, n$,

where $x_j \in \mathbb{R}$, $j = 1, \dots, n$, are the regression inputs, the $y_j$'s are the responses,
and the $\varepsilon_j$'s are independent $N(0, 1)$ noises. In this case we may take
$L(f, \mathrm{data}) = \sum_{j=1}^n (y_j - f(x_j))^2$ in the method of regularization (1).

The reproducing kernel Hilbert space $\mathcal{F}$ is typically of infinite dimension.
In many situations, including regression and generalized regression, when the
penalty functional $J(f)$ is a norm over $\mathcal{F}$, the representer theorem
[Kimeldorf and Wahba (1971)] guarantees that the solution to (1) over $\mathcal{F}$ falls
in the finite-dimensional space spanned by $\{K(x_j, \cdot), j = 1, \dots, n\}$, where
$K(\cdot, \cdot)$ is the reproducing kernel corresponding to $J(f)$. See also Schölkopf,
Herbrich and Smola (2001) for some generalizations of the representer
theorem. Therefore, we can write the solution as $f(x) = \sum_{j=1}^n c_j K(x_j, x)$. The
minimization problem can then be solved in this finite-dimensional space.

The smoothing spline well known in the nonparametric statistics litera- 
ture is an example of the method of regularization. In the smoothing spline 
the reproducing kernel Hilbert space $\mathcal{F}$ is a Hilbert Sobolev space and the
penalty functional $J(f)$ is the norm or semi-norm of the space, such as
$\int [f^{(m)}(x)]^2\,dx$. The commonly used cubic smoothing spline corresponds to
the case m = 2. The reproducing kernel of the Hilbert Sobolev space was 
given in Wahba (1990). 

The method of regularization has also been popular in the machine learn- 
ing literature. Examples include regularization networks and more recently, 
support vector machines. See, for example, Girosi, Jones and Poggio (1993), 
Smola, Schölkopf and Müller (1998), Wahba (1999) and Evgeniou, Pontil and
Poggio (2000). One reproducing kernel that is particularly popular in the 
machine learning literature is the Gaussian reproducing kernel (commonly 
referred to as the Gaussian kernel in the machine learning literature, not to 
be confused with the Gaussian kernel used in kernel smoothing in the
nonparametric statistics literature). Let $G(r) = (2\pi)^{-1/2} \omega^{-1} \exp(-r^2/(2\omega^2))$ be
the density function of $N(0, \omega^2)$. The Gaussian reproducing kernel has the
form $G(s, t) = G(s - t)$. This is a common example of the translation
invariant reproducing kernels popular in machine learning. It is known [Girosi,
Jones and Poggio (1993) and Smola, Schölkopf and Müller (1998)] that the
Gaussian reproducing kernel corresponds to the penalty functional (up to a
constant)

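(3)    $J(f) = \sum_{m=0}^{\infty} \frac{\omega^{2m}}{2^m m!} \int_{-\infty}^{\infty} [f^{(m)}(x)]^2\,dx$.

Smola, Schölkopf and Müller (1998) introduced the periodic Gaussian
reproducing kernel for estimating $2\pi$-periodic functions on $[-\pi, \pi]$ as the
reproducing kernel corresponding to the penalty functional

(4)    $J_\omega(f) = \sum_{m=0}^{\infty} \frac{\omega^{2m}}{2^m m!} \int_{-\pi}^{\pi} [f^{(m)}(x)]^2\,dx$.
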
From (3) and (4) it is clear that the two reproducing kernels are closely 
related. The connection between the two reproducing kernels will be clearer 
when we consider the computation with the periodic Gaussian reproducing 
kernel in Section 5. 

Many researchers in machine learning have derived upper bounds of the 
generalization performance of the method of regularization with the Gaus- 
sian or periodic Gaussian reproducing kernels. See Williamson, Smola and 
Schölkopf (2001) and the references therein. However, while popular in the
machine learning literature, and successful in many practical applications, 
the statistical asymptotic properties of the method of regularization with 
the Gaussian or periodic Gaussian reproducing kernels have not been stud- 
ied systematically. In this paper we study the asymptotic properties of the 
method of regularization with the periodic Gaussian reproducing kernel in 
nonparametric function estimation problems and derive the asymptotic risk 
(up to constants) of the method of regularization with the periodic Gaussian 
reproducing kernel. We choose to work with the periodic Gaussian repro- 
ducing kernel because it allows a detailed asymptotic analysis. We believe 
the results obtained in this paper should also give insights on the statistical 
properties of the Gaussian reproducing kernel. 

Motivated by the equivalence results of Brown and Low (1996) for Gaus- 
sian nonparametric regression and Nussbaum (1996) for density estimation 
[see also Golubev and Nussbaum (1998) for spectral density estimation; 
Grama and Nussbaum (1997) for nonparametric generalized linear regres- 
sion], we first look at the white noise problem 



(5)    $Y_n(t) = \int_{-\pi}^{t} f(u)\,du + n^{-1/2} B(t)$,  $t \in [-\pi, \pi]$,

where $B(t)$ is a standard Brownian motion on $[-\pi, \pi]$ and we observe $Y_n =
(Y_n(t), -\pi \le t \le \pi)$. We consider the situation where the function $f$ belongs
to a certain function ellipsoid of the form

(6)    $\{ f : f = \sum_{l=0}^{\infty} \theta_l \phi_l,\ \sum_{l=0}^{\infty} \rho_l \theta_l^2 \le Q \}$

for some positive sequence $\{\rho_l, l = 0, 1, \dots\}$. Here $\{\phi_0(t) = (2\pi)^{-1/2}$,
$\phi_{2l-1}(t) = \pi^{-1/2} \sin(lt)$, $\phi_{2l}(t) = \pi^{-1/2} \cos(lt)\}$ is the classical trigonometric basis in
$L_2(-\pi, \pi)$ and $\theta_l = \langle f, \phi_l \rangle$ is the corresponding Fourier coefficient, where
$\langle f, \phi \rangle = \int_{-\pi}^{\pi} f(t) \phi(t)\,dt$ denotes the usual inner product in $L_2(-\pi, \pi)$.

The commonly considered Sobolev ellipsoid $H^m(Q)$ corresponds to the
sequence $\rho_0 = 1$, $\rho_{2l-1} = \rho_{2l} = l^{2m} + 1$ in (6). This is the $m$th order Sobolev
space of periodic functions on $[-\pi, \pi]$. An alternative definition of $H^m(Q)$
is

(7)    $H^m(Q) = \{ f \in L_2(-\pi, \pi) : f \text{ is } 2\pi\text{-periodic},\ \int_{-\pi}^{\pi} [f(t)]^2 + [f^{(m)}(t)]^2\,dt \le Q \}$.



Therefore, the mth order Sobolev space consists of functions that possess 
mth order smoothness. The order of smoothness is determined by the rate 
at which the sequence of $\rho_l$'s increases. In the Sobolev space case the rate is
of polynomial order. 

Another function space that has been considered in the literature is the 
space of analytic functions. An ellipsoid of analytic functions $A_\alpha(Q)$
corresponds to (6) with the exponentially increasing sequence $\rho_l = \exp(\alpha l)$, where
$\alpha$ is a positive constant. Such a function space can be motivated by
considering the Fourier series in complex exponentials and the domain
in which the function is analytic. For details, see Johnstone (1998). The
norm of this function space cannot be expressed in terms of integrals of
squared derivatives of integer order. 

We now introduce a new function space that can be seen as the 
Sobolev space of infinite order, 

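(8)    $H^\infty_\omega(Q) = \{ f : f(t) = \sum_{l=0}^{\infty} \theta_l \phi_l(t),\ \sum_{l=0}^{\infty} \rho_l \theta_l^2 \le Q,\ \rho_0 = 1,\ \rho_{2l-1} = \rho_{2l} = e^{l^2\omega^2/2} \}$,

where $\omega$ is a positive constant, and the $\phi_l$'s are the classical trigonometric
basis over $(-\pi, \pi)$. Simple calculation shows that an equivalent definition of
$H^\infty_\omega(Q)$ is

$H^\infty_\omega(Q) = \{ f \in L_2(-\pi, \pi) : f \text{ is } 2\pi\text{-periodic},\ \sum_{m=0}^{\infty} \frac{\omega^{2m}}{2^m m!} \int_{-\pi}^{\pi} [f^{(m)}(t)]^2\,dt \le Q \}$.

From this we can see that $H^\infty_\omega(Q)$ can be seen as the Sobolev space of infinite
order, and that the penalty functional $J_\omega$ of the periodic Gaussian reproducing
kernel as defined in (4) corresponds to the norm of $H^\infty_\omega(Q)$.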
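
The equivalence of the two definitions can be checked coefficient by coefficient. The following is a sketch of that calculation, assuming the Fourier series may be differentiated term by term; it uses only the trigonometric basis and the weights $e^{l^2\omega^2/2}$ introduced above.

```latex
% Differentiating f = \sum_l \theta_l \phi_l m times multiplies the frequency-l pair by l^m,
% so by orthonormality of the trigonometric basis, for m >= 1,
\int_{-\pi}^{\pi} [f^{(m)}(t)]^2\,dt = \sum_{l=1}^{\infty} l^{2m}\,(\theta_{2l-1}^2 + \theta_{2l}^2),
\qquad
\int_{-\pi}^{\pi} [f(t)]^2\,dt = \sum_{l=0}^{\infty} \theta_l^2 .
% Summing over m and recognizing the exponential series in l^2 \omega^2 / 2:
\sum_{m=0}^{\infty} \frac{\omega^{2m}}{2^m m!} \int_{-\pi}^{\pi} [f^{(m)}(t)]^2\,dt
  = \theta_0^2 + \sum_{l=1}^{\infty} \Big( \sum_{m=0}^{\infty} \frac{(l^2\omega^2/2)^m}{m!} \Big)(\theta_{2l-1}^2 + \theta_{2l}^2)
  = \theta_0^2 + \sum_{l=1}^{\infty} e^{l^2\omega^2/2}\,(\theta_{2l-1}^2 + \theta_{2l}^2).
```

The right-hand side is the weighted sum of squared coefficients with weight 1 on the constant term and $e^{l^2\omega^2/2}$ on each sine and cosine pair.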

In this paper we focus on the method of regularization with the periodic
Gaussian penalty (4). We will refer to this method as periodic Gaussian
regularization. We study the statistical properties of this method both in
the situation that $f \in H^\infty_\omega$ and the situation $f \notin H^\infty_\omega$.

By converting the functions into the corresponding sequence of Fourier 
coefficients, we can see that the white noise problem (5) is equivalent to the 
following Gaussian sequence model: 

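(9)    $y_l = \theta_l + \varepsilon_l$,  $l = 0, 1, \dots$,

where the $\varepsilon_l$'s are independent $N(0, 1/n)$ noises and the $\theta_l$'s are the Fourier
coefficients of $f$. The periodic Gaussian regularization corresponds to

(10)    $\min_{\theta} \sum_{l=0}^{\infty} (y_l - \theta_l)^2 + \lambda \sum_{l=0}^{\infty} \beta_l \theta_l^2$

with $\beta_0 = 1$, $\beta_{2l-1} = \beta_{2l} = e^{l^2\omega^2/2}$.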
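
Because the criterion in (10) separates across coordinates, its minimizer has a closed form. The short derivation below is standard ridge-type algebra and is offered as a sketch, not as a quotation from the proofs.

```latex
% Minimize each coordinate separately; the objective is a convex quadratic in \theta_l.
\frac{\partial}{\partial \theta_l}\Big[(y_l - \theta_l)^2 + \lambda \beta_l \theta_l^2\Big]
  = -2\,(y_l - \theta_l) + 2\,\lambda \beta_l \theta_l = 0
\quad\Longrightarrow\quad
\hat\theta_l = \frac{y_l}{1 + \lambda \beta_l}, \qquad l = 0, 1, \dots
```

Since the penalty weights grow like $e^{l^2\omega^2/2}$, low frequencies are kept almost intact while high frequencies are shrunk essentially to zero.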

In Section 2 we establish the asymptotic minimax risk (up to the constant)
of nonparametric problems in the space $H^\infty_\omega(Q)$, and show that the periodic
Gaussian regularization achieves this optimal asymptotic risk. In Section 3
we study the asymptotic performance of the periodic Gaussian regularization
in the situation where the underlying function to be estimated is in the
Sobolev ellipsoid $H^m(Q)$ with unknown $m$ and $Q$, or in the analytic function
ellipsoid $A_\alpha(Q)$ with unknown $\alpha$ and $Q$. We show that the method under
study is asymptotically minimax in analytic function ellipsoids. For Sobolev
ellipsoids $H^m(Q)$, the periodic Gaussian regularization achieves the optimal
rate of convergence, and the efficiency in terms of the constant is reasonably
high, tending to 1 as $m$ goes to infinity.

In Section 4 we consider choosing the smoothing parameters with the 
unbiased estimator of risk. The procedure is the well known Mallows' $C_p$
[Mallows (1973)], sometimes called Mallows' $C_L$ in the literature. Li (1986,
1987) established the asymptotic optimality of $C_p$ in many nonparametric
function estimation methods, including the method of regularization.
Kneip (1994) obtained oracle inequalities for choosing smoothing parameters
with $C_p$ in ordered linear smoothers. See also Cavalier, Golubev, Picard
and Tsybakov (2002). These results can be used to study the periodic Gaus- 
sian regularization with smoothing parameters chosen by the unbiased risk 
estimator. We show that the resulting data-driven method retains the good 
theoretical properties of the periodic Gaussian regularization established in 
Sections 2 and 3. Thus, adaptive estimation is achieved for unknown order 
of smoothness by the periodic Gaussian regularization in the white noise 
model. 

Due to the equivalence between the white noise model and other statistical
models, we expect the periodic Gaussian regularization to have good statis- 
tical properties in other situations such as regression and generalized regres- 
sion. In fact, the equivalence results in Brown and Low (1996) show that 
the asymptotic results we obtained in Sections 2-4 for the white noise model 
apply to the periodic Gaussian regularization in the regression problem (2) 
with fixed equidistant design. In regression problems with nonequidistant de- 
sign, the periodic Gaussian regularization in regression does not match up 
exactly with the periodic Gaussian regularization in the white noise model, 
and therefore our results do not translate directly. However, we believe the 
results in the white noise model still give insights to the regression problem 
with general design. In this connection, see Brown and Zhao (2002). 

In Section 5 we consider the computation of the periodic Gaussian regular- 
ization in regression. The computation does not require equidistant design. 
Some simulations are given in Section 6 to study the finite sample properties 
of the periodic Gaussian regularization. In particular, the effect of the joint 
tuning of the smoothing parameters is studied, and the periodic Gaussian 
regularization is compared with the periodic cubic smoothing spline on four 
functions of different orders of smoothness. The simulation suggests that the 
finite sample performance of the periodic Gaussian regularization is compa- 
rable to that of the periodic cubic smoothing spline when the regression 
function is of moderate smoothness. In the case of a very smooth function, 
the periodic Gaussian regularization may have an advantage. Summary and 
discussion are given in Section 7. Technical proofs are relegated to Section 8. 

Throughout this paper the expression $a_n \sim b_n$ means that $a_n/b_n \to 1$ as
$n \to \infty$.

2. Estimation in the Sobolev space of infinite order. In this section we
consider the white noise problem in $H^\infty_\omega(Q)$.

Theorem 1. The asymptotic minimax risk for nonparametric function
estimation in the infinite-order Sobolev ellipsoid $H^\infty_\omega(Q)$ is $2\sqrt{2}\,\omega^{-1} n^{-1} (\log n)^{1/2}$.
That is,

$\inf_{\hat\theta} \sup_{\theta \in H^\infty_\omega(Q)} \sum_{l=0}^{\infty} E(\hat\theta_l - \theta_l)^2 \sim 2\sqrt{2}\,\omega^{-1} n^{-1} (\log n)^{1/2}$,

where the infimum is over all possible estimators $\hat\theta$.

Notice this asymptotic minimax risk does not depend on $Q$, but depends
on $\omega$.

In the following we consider the periodic Gaussian regularization. The 
following lemma will be used several times in later proofs. 

Lemma 1. Consider the periodic Gaussian regularization (10) in the
white noise model. Denote the estimator by $\hat\theta$. We have
$\mathrm{var}\,\hat\theta \sim 2\sqrt{2}\,\omega^{-1} n^{-1} (-\log\lambda)^{1/2}$ as $n \to \infty$ and $\lambda(n) \to 0$.
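
The approximation in Lemma 1 can be checked numerically. The sketch below assumes the estimator has the coordinatewise form $\hat\theta_l = y_l/(1 + \lambda\beta_l)$ obtained by minimizing (10) term by term, so that the total variance is $(1/n)\sum_l (1 + \lambda\beta_l)^{-2}$. The function names, the overflow guard and the particular values of $n$, $\lambda$ and $\omega$ are illustrative choices, not taken from the paper.

```python
import math


def shrinkage_weight(freq, lam, omega):
    """Weight 1 / (1 + lam * exp(freq^2 omega^2 / 2)) applied at frequency freq."""
    x = freq * freq * omega * omega / 2.0
    if x > 700.0:  # math.exp would overflow; the weight is numerically zero here
        return 0.0
    return 1.0 / (1.0 + lam * math.exp(x))


def total_variance(n, lam, omega, max_freq=100000):
    """Sum over the trigonometric basis of Var(theta_hat_l) = weight_l^2 / n."""
    total = shrinkage_weight(0, lam, omega) ** 2  # constant basis function
    for freq in range(1, max_freq + 1):
        w = shrinkage_weight(freq, lam, omega)
        if w == 0.0:
            break
        total += 2.0 * w * w  # sine and cosine share the same weight
    return total / n


def lemma1_approximation(n, lam, omega):
    """Leading term 2 sqrt(2) omega^{-1} n^{-1} (-log lam)^{1/2}."""
    return 2.0 * math.sqrt(2.0) * math.sqrt(-math.log(lam)) / (omega * n)


n, lam, omega = 1000, 1e-12, 0.1
ratio = total_variance(n, lam, omega) / lemma1_approximation(n, lam, omega)
print(round(ratio, 3))  # approaches 1 as lam decreases
```

The ratio is a finite-$\lambda$ quantity, so it only approaches 1 slowly; the lemma is a statement about the limit.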

Theorem 2. The periodic Gaussian regularization (10) in the white
noise model is asymptotically minimax in the infinite-order Sobolev ellipsoid
$H^\infty_\omega(Q)$, if the smoothing parameter $\lambda$ satisfies

(11)    $\log(1/\lambda) \sim \log n$ and $\lambda = o(n^{-1} (\log n)^{1/2})$.

That is,

$\inf_{\lambda} \sup_{\theta \in H^\infty_\omega(Q)} \sum_{l=0}^{\infty} E(\hat\theta_l - \theta_l)^2 \sim 2\sqrt{2}\,\omega^{-1} n^{-1} (\log n)^{1/2}$,

and this asymptotic risk is achieved when (11) is satisfied. Here $\hat\theta$ is the
method of regularization estimator from (10) with $\beta_{2l-1} = \beta_{2l} = e^{l^2\omega^2/2}$.

The condition (11) is satisfied if $n\lambda_n$ is bounded away from zero and infinity,
but is milder. For example, it is satisfied by sequences $\lambda_n = C n^{-1} (\log n)^{a}$
for any constants $C > 0$ and $-\infty < a < 1/2$. The adaptive choice of $\lambda$ is
considered in Section 4.

3. Estimation over Sobolev spaces and spaces of analytic functions. In
this section we consider the performance of the periodic Gaussian
regularization when the function $f$ to be estimated in the white noise problem
belongs to a Sobolev body $H^m(Q)$ with unknown $m$ and $Q$, or an analytic
function ellipsoid $A_\alpha(Q)$ with unknown $\alpha$ and $Q$. In these cases the
function to be estimated does not lie in the function space used in the method
of regularization.

Theorem 3. Assume $f \in H^m(Q)$ with $m \ge 1$ in the white noise model (5).
Consider the periodic Gaussian regularization estimator $\hat\theta$ from (10) with $\beta_{2l-1} =
\beta_{2l} = \exp(l^2\omega^2/2)$. We have

$\inf_{\lambda} \sup_{\theta \in H^m(Q)} \sum_{l} E(\hat\theta_l - \theta_l)^2 \sim (2m+1)\, m^{-2m/(2m+1)} Q^{1/(2m+1)} n^{-2m/(2m+1)}$.

This asymptotic risk is achieved when $\log(1/\lambda)/\omega^2 \sim (mnQ)^{2/(2m+1)}/2$.

Remark 1. The conclusion of Theorem 3 holds for noninteger $m > 1$.

For the ellipsoid $A_\alpha(Q)$ of analytic functions, we have the following:

Theorem 4. Assume $f \in A_\alpha(Q)$ in the white noise problem (5).
Consider the periodic Gaussian regularization estimator $\hat\theta$ from (10) with $\beta_{2l-1} =
\beta_{2l} = e^{l^2\omega^2/2}$. We have

$\inf_{\lambda} \sup_{\theta \in A_\alpha(Q)} \sum_{l} E(\hat\theta_l - \theta_l)^2 \sim 2 n^{-1} \alpha^{-1} \log n$.

This asymptotic risk is achieved when $\log(1/\lambda)/\omega^2 = (\log n)^2/(2\alpha^2)$.

The proof of this theorem is similar to that of Theorem 3, with $\rho_l = e^{\alpha l}$,
and is omitted. It is known that the asymptotic minimax risk in $A_\alpha(Q)$ is
$2 n^{-1} \alpha^{-1} \log n$; see Johnstone (1998). Therefore, Theorem 4 says that the
periodic Gaussian regularization is asymptotically minimax in $A_\alpha(Q)$.

We can study the asymptotic efficiency of the periodic Gaussian
regularization compared with the minimax estimator for nonparametric problems
in $H^m(Q)$. We consider the maximum asymptotic risk over $H^m(Q)$. We
compare the minimum of such asymptotic risk achieved by the periodic
Gaussian regularization with the minimax risk over $H^m(Q)$. This indicates
how close to the minimax value one can get with the periodic Gaussian
regularization. A similar study was carried out by Carter, Eagleson and
Silverman (1992), who studied the efficiency of the cubic smoothing spline
in the second-order Sobolev space.

It is well known that the asymptotic minimax risk over $H^m(Q)$ is

$[2m/(m+1)]^{2m/(2m+1)} (2m+1)^{1/(2m+1)} Q^{1/(2m+1)} n^{-2m/(2m+1)}$.

This can be derived with an argument along the line of the proof of
Theorem 1. Figure 1, left panel, gives the ratio between the asymptotic risk of
the periodic Gaussian regularization and the minimax risk when the
sample size $n$ is held fixed. The right panel gives the efficiency of
the periodic Gaussian regularization. The efficiency is calculated in terms of
sample sizes needed to achieve the same risk. We can see that the efficiency
goes to one when the function is very smooth. The lowest efficiency occurs
when $m = 1$, and the lowest efficiency is 33.3%. The efficiency when $m = 2$
is 53.3%.
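
The efficiency figures quoted above can be reproduced from the two risk constants. The sketch below assumes the constants $(2m+1) m^{-2m/(2m+1)}$ for the periodic Gaussian regularization and $[2m/(m+1)]^{2m/(2m+1)} (2m+1)^{1/(2m+1)}$ for the minimax risk, with the common factor $Q^{1/(2m+1)} n^{-2m/(2m+1)}$ cancelled; the function names are ours.

```python
def pg_risk_constant(m):
    """Constant of the periodic Gaussian regularization risk over H^m(Q)."""
    return (2 * m + 1) * m ** (-2 * m / (2 * m + 1))


def minimax_risk_constant(m):
    """Constant of the asymptotic minimax risk over H^m(Q)."""
    exponent = 2 * m / (2 * m + 1)
    return (2 * m / (m + 1)) ** exponent * (2 * m + 1) ** (1 / (2 * m + 1))


def risk_ratio(m):
    """Ratio of the two risks at a fixed sample size (left panel of Figure 1)."""
    return pg_risk_constant(m) / minimax_risk_constant(m)


def sample_size_efficiency(m):
    """Efficiency in terms of sample sizes needed to reach the same risk.

    Both risks scale like n^(-2m/(2m+1)), so matching them requires the sample
    size to grow by risk_ratio^((2m+1)/(2m)); the efficiency is its inverse.
    """
    return risk_ratio(m) ** (-(2 * m + 1) / (2 * m))


for m in (1, 2, 5, 20):
    print(m, round(risk_ratio(m), 4), round(sample_size_efficiency(m), 4))
```

Under these constants the efficiency simplifies to $2m^2/((2m+1)(m+1))$, which equals $1/3$ at $m = 1$ and $8/15$ at $m = 2$ and tends to 1 as $m$ grows.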

Fig. 1. The efficiency of the periodic Gaussian regularization method.

4. Adaptive choice of the smoothing parameter. In the earlier sections
we studied the performance of the periodic Gaussian regularization when
the smoothing parameter $\lambda$ has an appropriate rate of decrease. This
appropriate rate depends on $m$ (or $\alpha$ or $\omega$) and $Q$, which are generally unknown
in practice. In this section we consider the problem of choosing the
smoothing parameter with data. We study the common approach of choosing the
smoothing parameter through the unbiased estimator of risk (Mallows' $C_p$).
By making use of the oracle inequalities developed in Kneip (1994) [see also
Cavalier, Golubev, Picard and Tsybakov (2002)], we show that the
estimator chosen by the unbiased estimator of risk has the same asymptotic risk
as the estimator with the optimal (theoretical) smoothing parameter. Thus,
no asymptotic efficiency is lost due to not knowing $m$, $Q$ and $\omega$.

The number $\omega$ appears in the asymptotic risk of the periodic Gaussian
regularization estimator in the function space $H^\infty_\omega(Q)$, but does not play an
important role in the asymptotic risk in the function space $H^m(Q)$, so long
as $\lambda$ is suitably chosen. From (22) in the proof of Theorem 3 we can see
that the leading terms in the asymptotic risk in $H^m(Q)$ depend on $\omega$ and $\lambda$
only through $-\log\lambda/\omega^2$. The asymptotic results suggest that tuning one of
$\lambda$ and $\omega$ may suffice. For finite sample size, though, it may pay to tune $\omega$ as
well as $\lambda$. Usually there is a range of $\omega$ that works almost equally well if $\lambda$
is tuned correspondingly and vice versa. See the simulation in Section 6 for
examples. Thus, we consider a rough tuning for $\omega$, just to get to a reasonable
range, and a fine tuning over $\lambda$.

Formally, we take a finite number of $\omega$'s: $\omega_1, \dots, \omega_S$, and tune $\lambda$ and $\omega$
jointly over a range of $\lambda$ and $\omega_s \in \{\omega_1, \dots, \omega_S\}$. For asymptotic consideration, a range
of $[0, 1]$ for $\lambda$ suffices, since asymptotically $\lambda$ should go to zero. In practice
we may use a slightly larger range.

The tuning is based on the unbiased estimator of risk. Writing 

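$\tau_l = (1 + \lambda\beta_l)^{-1}$,

our estimator is

$\hat\theta_l = \tau_l y_l$.

We can express the risk of our estimator as

$\sum_{l=0}^{\infty} E(\hat\theta_l - \theta_l)^2 = (1/n) \sum_{l=0}^{\infty} \tau_l^2 + \sum_{l=0}^{\infty} (1 - \tau_l)^2 \theta_l^2$.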

Now an unbiased estimator for $\theta_l^2$ is $y_l^2 - (1/n)$. Plugging in, we get that

(12)    $\sum_{l=0}^{\infty} [(\tau_l^2 - 2\tau_l)(y_l^2 - 1/n) + (1/n)\tau_l^2] = \sum_{l=0}^{\infty} [(\tau_l^2 - 2\tau_l) y_l^2 + 2\tau_l/n]$

is an unbiased estimator of $\sum_l E(\hat\theta_l - \theta_l)^2 - \sum_l \theta_l^2$. We choose $\lambda^*$ and $\omega^*$
that minimize the unbiased risk (12), and use the corresponding periodic
Gaussian regularization estimator $\hat\theta^*$. Kneip (1994) studied the adaptive
choice among ordered linear smoothers with the unbiased risk estimator.
A family of ordered linear smoothers satisfies the condition that for any
member $\hat\theta_l = \tau_l y_l$, $l = 0, 1, \dots$, of the family, we have $\tau_l \in [0, 1]$ for all $l$; and for any
two members of the family, $\tau_l y_l$ and $\tau'_l y_l$, $l = 0, 1, \dots$, we have either $\tau_l \ge \tau'_l$
for all $l$, or $\tau'_l \ge \tau_l$ for all $l$. It is easy to check that for any fixed $\omega \in \{\omega_1, \dots, \omega_S\}$, the
method of regularization estimators with varying $\lambda$ form a family of ordered
linear smoothers. Applying the result in Kneip (1994) [recast in the Gaussian
sequence model setting in Cavalier, Golubev, Picard and Tsybakov (2002)]
to our situation gives the following:

Lemma 2. Consider the Gaussian sequence model (9) and the periodic
Gaussian regularization (10). Suppose $\lambda^*$ and $\omega^*$ minimize (12) over $\lambda \in
[0, 1]$ and $\omega \in \{\omega_1, \dots, \omega_S\}$, and $\hat\theta^*$ is the corresponding periodic Gaussian
regularization estimator. Then there exist positive constants $C_1$ and $C_2$ such
that for any $\theta \in \ell^2$ and any positive constant $B$, we have

(13)    $\sum_l E(\hat\theta^*_l - \theta_l)^2 \le (1 + C_1 B^{-1}) \min_{\lambda \in [0,1],\ \omega \in \{\omega_1, \dots, \omega_S\}} \sum_l E(\hat\theta_l - \theta_l)^2 + n^{-1} C_2 B$.

We then have the following:

Theorem 5. For the periodic Gaussian regularization estimator $\hat\theta^*$
chosen by the unbiased estimator of risk, we have

$\sup_{\theta \in H^\infty_{\omega_s}(Q)} \sum_l E(\hat\theta^*_l - \theta_l)^2 \sim 2\sqrt{2}\,\omega_s^{-1} n^{-1} (\log n)^{1/2}$  for all $s \in \{1, \dots, S\}$,

$\sup_{\theta \in H^m(Q)} \sum_l E(\hat\theta^*_l - \theta_l)^2 \sim (2m+1)\, m^{-2m/(2m+1)} Q^{1/(2m+1)} n^{-2m/(2m+1)}$,

$\sup_{\theta \in A_\alpha(Q)} \sum_l E(\hat\theta^*_l - \theta_l)^2 \sim 2 n^{-1} \alpha^{-1} \log n$.

Therefore, the adaptive periodic Gaussian regularization estimator $\hat\theta^*$ is
asymptotically minimax in $H^\infty_{\omega_s}(Q)$ and $A_\alpha(Q)$, and achieves the optimal
rate in $H^m(Q)$. The asymptotic efficiency is the same as that given in
Section 3. Hence, the estimator adapts to any unknown order of smoothness.
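
The tuning procedure of this section can be sketched in the Gaussian sequence model. The example below is a minimal illustration under our own assumptions: the sequence is truncated to 101 coefficients, the true coefficients, the grids for $\lambda$ and $\omega$, and all function names are hypothetical, and the criterion is the right-hand form of the unbiased risk, $\sum_l [(\tau_l^2 - 2\tau_l) y_l^2 + 2\tau_l/n]$ with $\tau_l = (1 + \lambda\beta_l)^{-1}$.

```python
import math
import random


def beta(index, omega):
    """Penalty weight for basis index 0, 1, 2, ...: 1 for the constant term,
    exp(freq^2 omega^2 / 2) for the sine and cosine of frequency freq."""
    freq = (index + 1) // 2
    x = freq * freq * omega * omega / 2.0
    return math.exp(min(x, 700.0))  # cap avoids overflow; the weight tau is ~0 there


def unbiased_risk(y, n, lam, omega):
    """Sum over coordinates of (tau^2 - 2 tau) y^2 + 2 tau / n."""
    total = 0.0
    for i, yi in enumerate(y):
        tau = 1.0 / (1.0 + lam * beta(i, omega))
        total += (tau * tau - 2.0 * tau) * yi * yi + 2.0 * tau / n
    return total


def tune(y, n, lambdas, omegas):
    """Grid search for the pair (lambda, omega) minimizing the unbiased risk."""
    grid = [(lam, om) for om in omegas for lam in lambdas]
    return min(grid, key=lambda p: unbiased_risk(y, n, p[0], p[1]))


# Synthetic data from the sequence model y_l = theta_l + N(0, 1/n) noise.
random.seed(0)
n = 200
theta = [1.0 / (1 + i) ** 2 for i in range(101)]  # hypothetical smooth signal
y = [t + random.gauss(0.0, 1.0 / math.sqrt(n)) for t in theta]

lambdas = [math.exp(-0.5 * k) for k in range(41)]  # fine grid inside (0, 1]
omegas = [0.2, 0.5, 1.0]                           # rough grid
lam_star, omega_star = tune(y, n, lambdas, omegas)

theta_star = [yi / (1.0 + lam_star * beta(i, omega_star)) for i, yi in enumerate(y)]
loss = sum((a - b) ** 2 for a, b in zip(theta_star, theta))
print(lam_star, omega_star, loss)
```

The coarse grid for $\omega$ and the fine grid for $\lambda$ mirror the rough and fine tuning described earlier in this section; the truncation of the sequence is a simulation convenience only.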

5. Computation of periodic Gaussian regularization in regression. In
order for the periodic Gaussian regularization in regression and generalized
regression to be practically computable, we need the form of the reproducing
kernel corresponding to the penalty functional $J_\omega(f)$, that is, the
reproducing kernel of $H^\infty_\omega$. Smola, Schölkopf and Müller (1998) gave the following
expression for the periodic Gaussian reproducing kernel:

(14)    $R(s, t) = (1/\pi) \sum_{l=1}^{\infty} \exp(-l^2\omega^2/2) \cos(l(s - t))$.

Due to the fast decay of the sequence $\exp(-l^2\omega^2/2)$, it is possible to
approximate the series (14) with finitely many terms. However, an alternative
formula of the kernel (14) is better suited for computation. We first state a
lemma due to Williamson, Smola and Schölkopf (2001).

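Lemma 3. Let $V(s - t)$ be a reproducing kernel with $V : \mathbb{R} \to \mathbb{R}$ being an
even function. Let

$V_v(r) = \sum_{k=-\infty}^{\infty} V(r - kv)$.

Then

$V_v(s - t) = \frac{\sqrt{2\pi}}{v} \tilde V(0) + \sum_{k=1}^{\infty} \frac{2\sqrt{2\pi}}{v}\, \tilde V\!\left(\frac{2k\pi}{v}\right) \cos\!\left(\frac{2k\pi(s - t)}{v}\right)$,

where $\tilde V$ is the Fourier transform of $V$.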

Define $G^\infty(r) = \sum_{k=-\infty}^{\infty} G(r - 2k\pi)$. It follows directly from Lemma 3
that $G^\infty(s - t)$ is the reproducing kernel (14) corresponding to the periodic
Gaussian regularization. The function $G^\infty$ can be approximated with the
finite series $G^J(s) = \sum_{k=-J}^{J} G(s - 2k\pi)$ for some $J$. In fact, we have

$0 < G^\infty(s) - G^1(s) < 2.1 \times 10^{-20}$  for all $s \in [-\pi, \pi]$ and $\omega \le 1$.

For $\omega > 1$, we can choose a positive integer $J$ such that $2J + 1 > 3\omega$. Then
$0 < G^\infty(s) - G^J(s) < 10^{-20}$ for all $s \in [-\pi, \pi]$. Therefore, $G^J(s)$ is an easily
computable proxy of $G^\infty(s)$.
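
The agreement between the wrapped sum and a cosine series can be checked numerically. The sketch below compares the truncated wrapped sum with $J = 1$ against the Fourier form of the wrapped Gaussian. By Poisson summation that Fourier form carries a constant term $1/(2\pi)$ in addition to the cosine sum; the constant is included in the code as our own addition and is not part of the cosine sum as displayed in (14). Function names and the value of $\omega$ are illustrative.

```python
import math


def gaussian_density(r, omega):
    """Density of N(0, omega^2) evaluated at r."""
    return math.exp(-r * r / (2.0 * omega * omega)) / (math.sqrt(2.0 * math.pi) * omega)


def wrapped_gaussian(r, omega, J=1):
    """Truncated wrapped sum: sum over k = -J..J of G(r - 2 k pi)."""
    return sum(gaussian_density(r - 2.0 * k * math.pi, omega) for k in range(-J, J + 1))


def cosine_series(r, omega, terms=200):
    """1/(2 pi) + (1/pi) * sum_{l=1}^{terms} exp(-l^2 omega^2 / 2) cos(l r)."""
    total = 1.0 / (2.0 * math.pi)  # zero-frequency term from Poisson summation
    for l in range(1, terms + 1):
        total += math.exp(-l * l * omega * omega / 2.0) * math.cos(l * r) / math.pi
    return total


omega = 0.7
for r in (-math.pi, -1.0, 0.0, 2.0, math.pi):
    print(r, wrapped_gaussian(r, omega), cosine_series(r, omega))
```

For small $\omega$ the cosine series needs many terms while the wrapped sum needs only three, which is why the wrapped form is the more convenient one for computation.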

Now consider the periodic Gaussian regularization in the regression
problem (2) with the empirical loss being $\sum_{j=1}^n (y_j - f(x_j))^2$. Here we assume
$x_j \in (-\pi, \pi)$, $j = 1, \dots, n$, and the regression function $f$ is $2\pi$-periodic. The
theory of reproducing kernel Hilbert spaces guarantees that the solution to
the method of regularization falls in a finite-dimensional space spanned by
$G^\infty(x_j, \cdot)$. That is, we can write $f(x) = \sum_{j=1}^n c_j G^\infty(x_j - x)$, and the
penalized regression (1) becomes

$(y - G^\infty c)'(y - G^\infty c) + \lambda c' G^\infty c$,

where, with little risk of confusion, we write $y = (y_1, \dots, y_n)'$, $c = (c_1, \dots, c_n)'$,
and $G^\infty$ is the $n \times n$ matrix $(G^\infty(x_i - x_j))$. The solution can then be found to
be $c = (G^\infty + \lambda I)^{-1} y$. In order to compute the solution as well as Mallows' $C_p$
for tuning the smoothing parameters, we use the eigenvalue-eigenvector
decomposition $G^\infty = VDV'$, where $D$ is the diagonal matrix of eigenvalues,
and $V$ is the orthonormal matrix of eigenvectors. Let

(15)    $T = D(D + \lambda I)^{-1}$.

Then $\hat f = Sy$, where $S = VTV'$. Mallows' $C_p$ in this context is $\|y - \hat f\|^2/n +
(2/n)\,\mathrm{tr}(S)$. Notice the computation of the periodic Gaussian regularization
in regression does not require equidistant design.
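
The computation for a single pair of smoothing parameters can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the simulations: it replaces the eigendecomposition by a direct linear solve (which yields the same $c$ and the same trace for one fixed $\lambda$, using $S = I - \lambda (G^\infty + \lambda I)^{-1}$), it uses the truncated kernel with $J = 1$, and the data, parameter values and function names are hypothetical.

```python
import math
import random


def periodic_gaussian_kernel(r, omega, J=1):
    """Gaussian density wrapped onto the circle, truncated to |k| <= J."""
    c = 1.0 / (math.sqrt(2.0 * math.pi) * omega)
    return sum(c * math.exp(-(r - 2.0 * k * math.pi) ** 2 / (2.0 * omega ** 2))
               for k in range(-J, J + 1))


def solve(a, b):
    """Solve a x = b for all columns of b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, len(m[r])):
                m[r][k] -= factor * m[col][k]
    width = len(m[0]) - n
    x = [[0.0] * width for _ in range(n)]
    for r in range(n - 1, -1, -1):  # back substitution
        for j in range(width):
            s = m[r][n + j] - sum(m[r][k] * x[k][j] for k in range(r + 1, n))
            x[r][j] = s / m[r][r]
    return x


def fit(x, y, lam, omega):
    """Return coefficients c, fitted values, tr(S) and Mallows' C_p."""
    n = len(x)
    gram = [[periodic_gaussian_kernel(xi - xj, omega) for xj in x] for xi in x]
    shifted = [[gram[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    # First right-hand side is y; the others are the identity, giving the inverse.
    rhs = [[y[i]] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sol = solve(shifted, rhs)
    c = [row[0] for row in sol]
    fitted = [sum(gram[i][j] * c[j] for j in range(n)) for i in range(n)]
    trace_s = n - lam * sum(sol[i][i + 1] for i in range(n))
    cp = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n + 2.0 * trace_s / n
    return c, fitted, trace_s, cp


random.seed(1)
n = 30
x = sorted(random.uniform(-math.pi, math.pi) for _ in range(n))  # nonequidistant inputs
y = [math.sin(xi) + random.gauss(0.0, 1.0) for xi in x]
c, fitted, trace_s, cp = fit(x, y, lam=0.5, omega=0.8)
print(round(trace_s, 3), round(cp, 3))
```

When many values of $\lambda$ are tried for the same $\omega$, the eigendecomposition described above is the better choice, since $V$ and $D$ are computed once and only the diagonal matrix $T$ changes with $\lambda$.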

It is possible to leave the constant term in the regression function unpenalized,
as is commonly done in practice with smoothing splines and Gaussian
regularization. This is equivalent to having $\beta_0 = 0$ in (10), and the
asymptotic results do not change. The penalized regression can be written as

$\min \sum_{j=1}^n (y_j - (f(x_j) + b))^2 + \lambda J_\omega(f)$.

In this case the theory of reproducing kernel Hilbert spaces dictates that
the solution can be expressed as $\hat f = G^\infty c + be$, where $e = (1, \dots, 1)'$. In the
case of equidistant sample inputs, we can see that $e$ is an eigenvector of $G^\infty$,
since $G^\infty$ is periodic and even. In this case the computation is very similar
to the case above with constants penalized: one simply replaces the diagonal
element of $T$ in (15) corresponding to the eigenvector $e$ by 1, and continues
the computation with the new $T$.

6. Simulations. We conduct some simulations to study the finite sample 
properties of the periodic Gaussian regularization in regression. Consider 
the regression problem (2) with the following four functions on $[-\pi, \pi]$:

$f_1(x) = \sin^2(x)\, 1_{(x > 0)}$,
$f_2(x) = -x - \pi + 2(x + \pi/2)\, 1_{(x \ge -\pi/2)} + 2(-x + \pi/2)\, 1_{(x > \pi/2)}$,
$f_3(x) = 1/(2 - \sin(x))$,
$f_4(x) = 2 + \sin(x) + 2\cos(x) + 3\sin^2(x) + 4\cos^3(x) + 5\sin^3(x)$.

The plots of the four functions are given in Figure 2. These are all $2\pi$-periodic
functions. The first function has only the second order of smoothness. The 
second function has only the first order of smoothness. The third function 
is infinitely smooth. The fourth function is even smoother: its Fourier series 
only contains finitely many terms. In all of our simulations the sample size 
is taken to be 100. All simulations are done in Matlab. 
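
For readers who wish to reproduce the setup outside Matlab, the four test functions and the data-generating step can be sketched in Python. The indicator thresholds below follow the reading under which the second function is a continuous triangle wave; the grid convention for the equidistant points, the seed and the function names are our assumptions.

```python
import math
import random


def f1(x):
    """sin^2(x) on the positive half of the interval, zero elsewhere."""
    return math.sin(x) ** 2 if x > 0 else 0.0


def f2(x):
    """Piecewise linear triangle wave on [-pi, pi]."""
    value = -x - math.pi
    if x >= -math.pi / 2:
        value += 2.0 * (x + math.pi / 2)
    if x > math.pi / 2:
        value += 2.0 * (-x + math.pi / 2)
    return value


def f3(x):
    """Infinitely smooth periodic function."""
    return 1.0 / (2.0 - math.sin(x))


def f4(x):
    """Trigonometric polynomial: its Fourier series has finitely many terms."""
    s, c = math.sin(x), math.cos(x)
    return 2.0 + s + 2.0 * c + 3.0 * s ** 2 + 4.0 * c ** 3 + 5.0 * s ** 3


def simulate(f, n=100, seed=0):
    """Draw y_j = f(x_j) + N(0, 1) noise at n equidistant points in (-pi, pi)."""
    rng = random.Random(seed)
    xs = [-math.pi + 2.0 * math.pi * (j + 0.5) / n for j in range(n)]  # assumed grid
    ys = [f(x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


xs, ys = simulate(f1)
print(len(xs), min(xs), max(xs))
```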

Fig. 2. The regression functions used in the simulations. The first function has only the
second order of smoothness. The second function has only the first order of smoothness.
The third function is infinitely smooth. The fourth function has a Fourier series that only
contains finitely many terms.

First we study the effect of the joint tuning of $\lambda$ and $\omega$. We look at
the regression problem (2) with the first regression function $f_1(x)$. In the
first simulation we take the sample points to be equidistant in $(-\pi, \pi)$. The
scatter plot is shown in Figure 3, top left panel. We use the periodic Gaussian
regularization to do the estimation over a grid with $\omega^2$ proportional to $k_1$ and
$\lambda = \exp(-k_2/5)$, $k_1 = 1, \dots, 100$, $k_2 = 1, \dots, 100$. For each combination of $\omega$ and $\lambda$ we calculate the
solution $\hat f_{\lambda,\omega}$ and the averaged squared error $(1/n) \sum_j [\hat f_{\lambda,\omega}(x_j) - f_1(x_j)]^2$.
The bottom left panel of Figure 3 gives the corresponding contour plot
of the averaged squared error. The x- and y-axes for the contour plot are
$k_1$ and $k_2$, which are proportional to $\omega^2$ and $-\log\lambda$. Let the minimum
of the averaged squared error be $a$. The levels in the contour plot are at
$1.01a, 1.05a, 1.1a, 1.2a, 1.5a, 2a, 3a, 4a, 5a, 6a$. We used these levels to focus
on the behavior of the averaged squared error around its minimum. It is
clear that the contour levels are almost straight lines, indicating that the
averaged squared errors are almost the same when $-\log\lambda$ varies linearly
with $\omega^2$. This agrees with what is suggested by the asymptotic results, and
suggests that in regression problems, as long as $\omega$ is fixed in a reasonable
range, we can concentrate on the tuning of the smoothing parameter $\lambda$.

Like any method of regularization, the periodic Gaussian regularization
does not depend on the x's being equidistant. The same phenomenon in
the joint tuning of $\lambda$ and $\omega$ appears when the input x's are not equidistant.
We run the same simulation with nonequidistant x's, and the corresponding
scatter plot and the contour plot are given in the right panels of Figure 3.
The nonequidistant x values are generated by taking the fractional part of a
normal variate with mean 1/4 and standard deviation 1/4, and then scaling
the $[0, 1]$ interval to $[-\pi, \pi]$.
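
The construction of the nonequidistant design can be sketched as follows. The sketch assumes that "fractional part" means $z - \lfloor z \rfloor$, so that negative variates are mapped into $[0, 1)$ as well; the seed, the sorting of the points and the function name are our own choices.

```python
import math
import random


def nonequidistant_design(n, seed=0):
    """Fractional parts of N(1/4, (1/4)^2) variates, rescaled from [0, 1] to [-pi, pi]."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        z = rng.gauss(0.25, 0.25)
        frac = z - math.floor(z)  # assumed convention for the fractional part
        xs.append(-math.pi + 2.0 * math.pi * frac)
    return sorted(xs)


design = nonequidistant_design(100)
print(design[0], design[-1])
```

Because most of the normal mass lies in $(0, 1/2)$, the points cluster in the left half of the interval and are sparse elsewhere, which is what makes the design markedly nonequidistant.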

We run the same experiment with the other functions, $f_2$, $f_3$ and $f_4$, and
the same observation about the joint tuning of $\lambda$ and $\omega$ is made in these
experiments. This supports our strategy of a rough tuning for $\omega$ and a fine
tuning over $\lambda$.

Fig. 3. The top panels are the scatter plots of the data generated from the regression
model (2) with the regression function $f_1(x)$. Left: equidistant case. Right: nonequidistant
case. The bottom panels are the corresponding contour plots of the averaged squared errors
of the periodic Gaussian regularization. The x- and y-axes for the contour plots are
proportional to $\omega^2$ and $-\log\lambda$, respectively. We can see that in both cases the contour levels
are very close to straight lines.

Table 1
Averaged squared error over 100 runs, for the periodic cubic smoothing
spline, the periodic Gaussian regularization, and the periodic Gaussian
regularization with constant left unpenalized, on four different functions of
varying order of smoothness

                                   Averaged squared error
Regression   Periodic cubic      Periodic Gaussian regularization
function     smoothing spline    Constant penalized   Constant unpenalized
1            0.0711              0.0675               0.0682
2            0.0541              0.0578               0.0582
3            0.0457              0.0462               0.0448
4            0.1136              0.0899               0.0899

Next we compare the periodic Gaussian regularization with the periodic 
cubic smoothing spline for regression on the circle on the four functions in 
Figure 2. The periodic cubic smoothing spline is the solution to 

$\min \sum_{j=1}^n (y_j - f(x_j))^2 + \lambda \int_{-\pi}^{\pi} [f''(x)]^2\,dx$.



This penalty corresponds to the second-order Sobolev space, but leaves 
the linear functions unpenalized. For an introduction to the periodic cu- 
bic smoothing spline, see Wahba (1990) or Gu (2002). 

We fix the x's to be equidistant in $(-\pi, \pi)$ in our comparison. We generate
y's according to the regression model (2) with the four functions we
consider. In both the periodic Gaussian regularization and the periodic cubic
smoothing spline, the smoothing parameters are chosen according to
Mallows' $C_p$. We search the minimal point of Mallows' $C_p$ over $\omega = 0.3k_1 - 0.1$,
for $k_1 = 1, \dots, 10$, and $\lambda = \exp(-0.4k_2 + 7)$, for $k_2 = 1, \dots, 50$, for the
periodic Gaussian regularization; and we search over $\lambda = \exp(-0.4k_2 + 7)$, for
$k_2 = 1, \dots, 50$, for the smoothing spline. We use the chosen smoothing
parameter(s) to compute the solutions. For each generated dataset, we calculate
the averaged squared error of the periodic Gaussian regularization and the
periodic cubic smoothing spline.

We run the simulation 100 times. The averaged squared errors over the 
100 runs are summarized in Table 1. For each regression function, a two-sided 
paired t-test is performed to compare the periodic Gaussian regularization 
and the periodic cubic smoothing spline based on the 100 runs. For the first 
function, the p-value is 0.49; for the second function, the p-value is 0.06, 
and it seems the smoothing spline may perform better; for the third function, 
the p-value is 0.9; for the fourth function, the p-value is very close to 0, 
and the periodic Gaussian regularization performed significantly better: we 
can see the averaged squared error of the periodic Gaussian regularization 
is 22% less than that of the periodic smoothing spline. 

7. Summary and discussion. In this paper we study the method of 
regularization with the periodic Gaussian kernel. Asymptotically, the method 
adapts to unknown order of smoothness and is efficient compared with 
the minimax risk when the underlying function is reasonably smooth. The 
smoothing parameters in the periodic Gaussian regularization can be chosen 
adaptively without loss of asymptotic efficiency. Limited experiments in the 
finite sample case suggest that the performance of the periodic Gaussian 
regularization is comparable to that of the periodic cubic smoothing spline 
when the underlying regression function is reasonably smooth, and the 
periodic Gaussian regularization may have some advantage over the periodic 
cubic smoothing spline when the regression function is very smooth. This 
agrees with the asymptotic analysis, since it is well known that the cubic 
smoothing spline does not adapt to high order of smoothness. 

The Gaussian reproducing kernel is commonly used in practice and has 
been successful in empirical studies. Our study on the periodic Gaussian 
reproducing kernel gives a partial explanation of the success of Gaussian 
reproducing kernel in practice, as we expect the Gaussian reproducing kernel 
to have similar properties to its periodic counterpart. When we apply the 
nonperiodic version of the Gaussian kernel to the examples in our simulation, 
the results are slightly inferior to those of the periodic version. This is 
to be expected, as the nonperiodic version does not take advantage of the 
fact that the functions in the simulation are periodic. However, the 
difference is not large: the averaged squared errors are 0.0736, 0.0679, 
0.0559 and 0.1198. 

The penalty functional $J_\omega$ in periodic Gaussian regularization 
corresponds to the norm of the infinite-order Sobolev space $H_\omega^\infty$. 
It is also possible to consider the method of regularization with the penalty 
functional being the norm of the space $A_\alpha$ of analytic functions. This 
penalty cannot be written in terms of integrals of squared derivatives of 
integer order, but can be written in terms of derivatives of fractional order. 
In the Gaussian sequence model setting, the method of regularization with the 
analytic function space penalty is equivalent to the method of regularization 
(10) with $\beta_l = \exp(\alpha l)$. Similar asymptotic results as derived 
for the periodic Gaussian regularization can be derived for this alternative 
regularization: the method adapts to Sobolev space $H^m$ with unknown 
smoothness $m$. It is also possible to give an explicit expression for the 
reproducing kernel. In fact, the reproducing kernel is (14) with 
$\exp(-\omega^2 l^2/2)$ replaced by $\exp(-\alpha l)$. An equivalent form of 
this reproducing kernel is $E^\infty(s - t)$, with $E^\infty(r)$ defined as 
$E^\infty(r) = \sum_{k=-\infty}^{\infty} E(r - 2k\pi)$ and 
$E(r) = \alpha/[\pi(r^2 + \alpha^2)]$ the Cauchy density function. This form 
follows from Lemma 3. Unlike the periodic Gaussian kernel case, the decay of 
$E(x)$ is slow, and it does not seem practical to use the form 
$E^\infty(s - t)$ for computation. On the other hand, it might be possible 
to calculate the reproducing kernel with the series in (14) with 
$\exp(-\alpha l)$. 
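
The contrast between the two forms can be illustrated numerically. The sketch below is illustrative only: it assumes the normalization $(2\pi)^{-1}[1 + 2\sum_{l \ge 1} e^{-\alpha l}\cos(lr)]$ for the series, which is the one that makes it equal the periodized Cauchy density by Poisson summation; the normalization used in (14) may differ by a constant. The function names and the closed form $\sinh\alpha / [2\pi(\cosh\alpha - \cos r)]$ (the classical Poisson kernel) are not taken from the paper.

```python
import math


def kernel_series(r, alpha, terms=60):
    """Fourier series form: geometric decay exp(-alpha * l) of the coefficients."""
    s = sum(math.exp(-alpha * l) * math.cos(l * r) for l in range(1, terms + 1))
    return (1.0 + 2.0 * s) / (2.0 * math.pi)


def kernel_periodized(r, alpha, K=1000):
    """Periodized Cauchy density truncated to |k| <= K; the tail decays like 1/K."""
    def cauchy(u):
        return alpha / (math.pi * (u * u + alpha * alpha))
    return sum(cauchy(r - 2.0 * math.pi * k) for k in range(-K, K + 1))


def kernel_closed(r, alpha):
    """Closed form of the same function (Poisson kernel with rho = exp(-alpha))."""
    return math.sinh(alpha) / (2.0 * math.pi * (math.cosh(alpha) - math.cos(r)))
```

With a few dozen terms the series agrees with the closed form to near machine precision, whereas the truncated periodization needs on the order of a thousand terms per side to reach an error below 1e-4, which is consistent with the remark that the series is the more practical form.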



8. Proofs. 



Proof of Theorem 1. The proof is an application of the theorem of 
Pinsker (1980). For completeness we state a form of the theorem given in 
Johnstone [(1998), Proposition 6.1 and Theorem 6.2]: 

Pinsker's theorem. Consider the Gaussian sequence model (9) with 
the parameter space being the ellipsoid 
$\Theta = \{\theta : \sum_l a_l^2\theta_l^2 \le Q\}$ with $a_l > 0$ and 
$a_l \to \infty$. Then the minimax risk $R(\Theta, n)$ is asymptotically 
equivalent to the linear minimax risk $R_L(\Theta, n)$, which satisfies 

(16) $R_L(\Theta, n) = \frac{1}{n}\sum_l (1 - a_l/\mu)_+$, 

where $\mu = \mu(n, Q)$ is determined by 

(17) $\frac{1}{n}\sum_l a_l(\mu - a_l)_+ = Q$. 

In our case we have $a_{2l} = a_{2l-1} = \exp(l^2\omega^2/4)$, and (17) 
becomes 

$$2\sum_{l=1}^{k} \exp(l^2\omega^2/4)\{\mu - \exp(l^2\omega^2/4)\} = nQ,$$ 

with $k = k(\mu) = [2\omega^{-1}(\log\mu)^{1/2}]$, where $[\cdot]$ stands for 
the integer part. Notice that sums such as 
$\sum_{l=1}^{k}\exp(l^2\omega^2/4)$ are dominated by the single leading 
term. Some calculations then give that $\log\mu \sim (1/2)\log(nQ)$. 
Therefore, 

$$k = k(n) \sim 2^{1/2}\omega^{-1}(\log(nQ))^{1/2}.$$ 
Hence, it follows from Pinsker's theorem that 

$$R(\Theta, n) \sim R_L(\Theta, n) 
= \frac{2}{n}\sum_{l=1}^{k}\left(1 - \frac{\exp(l^2\omega^2/4)}{\mu}\right) 
\sim \frac{2}{n}k(n) \sim 2^{3/2}n^{-1}\omega^{-1}(\log n)^{1/2}.$$ 

This completes the proof of Theorem 1. 
□ 
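
As a numerical illustration of the argument above, the following sketch solves the specialized form of (17) for $\mu$ by bisection and evaluates the linear minimax risk from (16), so that it can be compared with the asymptotic expression $2^{3/2}n^{-1}\omega^{-1}(\log n)^{1/2}$. The function name `pinsker_linear_risk` and the parameter values in the usage note are assumptions made for illustration; like the displayed equation, the sketch uses only the terms with $l \ge 1$.

```python
import math


def pinsker_linear_risk(n, Q, omega):
    """Solve 2 * sum_l a_l (mu - a_l)_+ = n Q with a_l = exp(l^2 omega^2 / 4).

    Returns (mu, k, risk): the solution mu, the number k of active
    frequencies, and the linear minimax risk (2/n) * sum_{l<=k} (1 - a_l/mu).
    """
    def a(l):
        return math.exp(l * l * omega * omega / 4.0)

    def lhs(mu):
        total, l = 0.0, 1
        while a(l) < mu:                 # only terms with a_l < mu contribute
            total += 2.0 * a(l) * (mu - a(l))
            l += 1
        return total

    lo, hi = 1.0, 2.0                    # lhs(1) = 0 because a_l > 1 for l >= 1
    while lhs(hi) < n * Q:
        hi *= 2.0
    for _ in range(200):                 # bisection on a geometric scale
        mid = math.sqrt(lo * hi)
        if lhs(mid) < n * Q:
            lo = mid
        else:
            hi = mid
    mu = hi

    k, risk, l = 0, 0.0, 1
    while a(l) < mu:
        risk += 2.0 * (1.0 - a(l) / mu) / n
        k = l
        l += 1
    return mu, k, risk
```

For example, `pinsker_linear_risk(10**6, 1.0, 1.0)` can be compared with `2**1.5 * math.sqrt(math.log(10**6)) / 10**6`. Because the convergence is logarithmic in n, the two values should be expected to agree only roughly at moderate sample sizes.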

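Proof of Lemma 1. Solving the minimization problem (10), we get 
the method of regularization estimator 
$\hat\theta_l = (1 + \lambda\beta_l)^{-1}y_l$. As $\lambda$ goes to zero, 
we have 

$$\begin{aligned} 
\sum_l \operatorname{var}\hat\theta_l 
&= (1/n)\sum_l (1 + \lambda\beta_l)^{-2} \\ 
&\sim (2/n)\sum_{l=1}^{\infty} (1 + \lambda e^{l^2\omega^2/2})^{-2} \\ 
&\sim (2/n)\int_0^{\infty} (1 + \lambda e^{x^2\omega^2/2})^{-2}\,dx \\ 
&= \frac{\sqrt{2}}{n\omega}\int_{\log\lambda}^{\infty} 
   (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy \\ 
&= \frac{\sqrt{2}}{n\omega}\left[\int_{\log\lambda}^{0} 
   (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy 
   + \int_{0}^{\infty} (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy\right], 
\end{aligned}$$ 

where the fourth line uses the change of variable 
$y = \log\lambda + x^2\omega^2/2$. 

For the second term in the bracket, we have 

$$0 < \int_0^{\infty} (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy 
< (-\log\lambda)^{-1/2}\int_0^{\infty} (1 + e^{y})^{-2}\,dy.$$ 

Now let us look at the first term in the bracket. We have, on one hand, 

$$\int_{\log\lambda}^{0} (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy 
< \int_{\log\lambda}^{0} (y - \log\lambda)^{-1/2}\,dy 
= 2(-\log\lambda)^{1/2};$$ 

on the other hand, 

$$\begin{aligned} 
\int_{\log\lambda}^{0} (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy 
&> \int_{\log\lambda}^{-\log(-\log\lambda)} 
   (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy \\ 
&> (1 + (-\log\lambda)^{-1})^{-2} 
   \int_{\log\lambda}^{-\log(-\log\lambda)} (y - \log\lambda)^{-1/2}\,dy \\ 
&\sim 2(-\log\lambda)^{1/2}. 
\end{aligned}$$ 

Therefore, we have 

$$\int_{\log\lambda}^{0} (1 + e^{y})^{-2}(y - \log\lambda)^{-1/2}\,dy 
\sim 2(-\log\lambda)^{1/2},$$ 

and the conclusion of the lemma follows. □ 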
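
The asymptotic equivalence in Lemma 1 can be checked numerically. The sketch below is illustrative only: it evaluates n times the variance sum in the form $2\sum_{l \ge 1}(1 + \lambda e^{l^2\omega^2/2})^{-2}$ used in the proof and compares it with n times the limit, $2\sqrt{2}\,\omega^{-1}(-\log\lambda)^{1/2}$. The function names and the parameter values in the usage note are assumptions.

```python
import math


def variance_sum(lam, omega):
    """n * sum_l var(theta_hat_l), with beta_{2l-1} = beta_{2l} = exp(l^2 omega^2/2)."""
    total, l = 0.0, 1
    while True:
        expo = min(l * l * omega * omega / 2.0, 700.0)   # cap to avoid overflow
        w = 1.0 / (1.0 + lam * math.exp(expo))           # shrinkage weight
        term = 2.0 * w * w                               # two coefficients per l
        if term < 1e-18:                                 # remaining terms negligible
            break
        total += term
        l += 1
    return total


def lemma1_asymptote(lam, omega):
    """n times the asymptotic variance, 2*sqrt(2)/omega * (-log lam)^(1/2)."""
    return 2.0 * math.sqrt(2.0) / omega * math.sqrt(-math.log(lam))
```

Evaluating the ratio `variance_sum(lam, omega) / lemma1_asymptote(lam, omega)` at, say, `omega = 0.1` for decreasing `lam` shows it approaching 1 from below, at a rate governed by $(-\log\lambda)^{-1}$.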

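Proof of Theorem 2. The periodic Gaussian regularization estimator 
is $\hat\theta_l = (1 + \lambda\beta_l)^{-1}y_l$. We have, for any 
$\theta \in H_\omega^\infty(Q)$, 

$$\sum_{l=0}^{\infty} (E\hat\theta_l - \theta_l)^2 
= \sum_{l=0}^{\infty} \lambda^2\beta_l^2(1 + \lambda\beta_l)^{-2}\theta_l^2 
\le \frac{1}{4}\lambda\sum_{l=0}^{\infty}\beta_l\theta_l^2 
\le \frac{1}{4}\lambda Q,$$ 

where the first inequality uses 
$\lambda\beta_l(1 + \lambda\beta_l)^{-2} \le 1/4$. 

Hence, from Lemma 1 we have, for any $\theta \in H_\omega^\infty(Q)$, 

$$\sum_l E(\hat\theta_l - \theta_l)^2 
= \sum_{l=0}^{\infty} (E\hat\theta_l - \theta_l)^2 
+ \sum_{l=0}^{\infty}\operatorname{var}\hat\theta_l 
\lesssim 2\sqrt{2}\,\omega^{-1}n^{-1}(-\log\lambda)^{1/2} + Q\lambda/4.$$ 

The last quantity is asymptotically equivalent to the asymptotic minimax 
risk $2\sqrt{2}\,\omega^{-1}n^{-1}(-\log\lambda)^{1/2}$ under (11). Therefore, 
under (11), the periodic Gaussian regularization estimator is asymptotically 
minimax. □ 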

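Proof of Theorem 3. The estimator is 
$\hat\theta_l = (1 + \lambda\beta_l)^{-1}y_l$. From Lemma 1, we have 

$$\sum_l \operatorname{var}\hat\theta_l 
= (1/n)\sum_l (1 + \lambda\beta_l)^{-2} 
\sim 2\sqrt{2}\,\omega^{-1}n^{-1}(-\log\lambda)^{1/2}.$$ 

On the other hand, we have 

$$\begin{aligned} 
\sup_{\theta\in H^m(Q)} \sum_l (E\hat\theta_l - \theta_l)^2 
&= \sup_{\theta\in H^m(Q)} \sum_l 
   (1 + \lambda^{-1}\beta_l^{-1})^{-2}\theta_l^2 \\ 
&= \sup_{\theta\in H^m(Q)} \sum_{l=0}^{\infty} 
   (1 + \lambda^{-1}\beta_l^{-1})^{-2}\rho_l^{-1}(\rho_l\theta_l^2). 
\end{aligned}$$ 

Here $\rho_{2l-1} = \rho_{2l} = 1 + l^{2m}$ are the coefficients in the 
definition (6) of the Sobolev ellipsoid $H^m(Q)$. Clearly, the maximum is 
achieved by putting all mass $Q$ at the term $l$ that maximizes 
$(1 + \lambda^{-1}\beta_l^{-1})^{-2}\rho_l^{-1}$. That is, the maximum is 

(18) $Q\max_l (1 + \lambda^{-1}\beta_l^{-1})^{-2}\rho_l^{-1}$. 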

To evaluate (18), we first find the minimizer of 

$$B_\lambda(x) = [1 + \lambda^{-1}\exp(-x^2\omega^2/2)]^2(1 + x^{2m}) 
\quad\text{over } x \ge 0.$$ 

Let $x_0(\lambda)$ be a global minimizer of $B_\lambda(x)$. It is easy to see 
that $x_0(\lambda) \ne \infty$, since $B_\lambda(\infty) = \infty$. Now let us 
first show that $x_0(\lambda) \to \infty$ as $\lambda \to 0$. We prove this 
with the elementary definition of limits. For any $M > 0$, we can find 
$\tilde{x} > M$ such that 
$\exp\{(\tilde{x}^2 - M^2)\omega^2\} > 1 + \tilde{x}^{2m}$. Then 
$\lim_{\lambda\to 0} D(\lambda) > 1$, where 

$$D(\lambda) = [\lambda + \exp(-M^2\omega^2/2)]^2 
[\lambda + \exp(-\tilde{x}^2\omega^2/2)]^{-2}(1 + \tilde{x}^{2m})^{-1}.$$ 

Therefore, there exists $\delta > 0$ such that $D(\lambda) > 1$ for any 
$\lambda < \delta$. On the other hand, for any $x \le M$, we have 
$B_\lambda(x)/B_\lambda(\tilde{x}) \ge D(\lambda)$. Therefore, for any 
$\lambda < \delta$, we have 
$\inf_{x\le M} B_\lambda(x) > B_\lambda(\tilde{x})$, and hence 
$x_0(\lambda) > M$. This shows that $x_0(\lambda) \to \infty$ as 
$\lambda \to 0$. 

Since $x_0(\lambda) \ne \infty$, we have $B_\lambda'(x_0) = 0$. That is, 

(19) $m^{-1}\omega^2(x_0^2 + x_0^{-(2m-2)}) = 1 + \lambda\exp(x_0^2\omega^2/2)$. 

Since $x_0(\lambda) \to \infty$ as $\lambda \to 0$, we have 

(20) $m^{-1}\omega^2x_0^2 \sim \lambda\exp(x_0^2\omega^2/2)$, 

(21) $x_0^2\omega^2/2 \sim (-\log\lambda)$. 

Therefore, by (19) and (20) we have 

$$B_\lambda(x_0) 
= [1 + (m^{-1}\omega^2(x_0^2 + x_0^{-(2m-2)}) - 1)^{-1}]^2(1 + x_0^{2m}) 
\sim x_0^{2m}.$$ 

From this and (21), we see that 

$$Q\max_l (1 + \lambda^{-1}\beta_l^{-1})^{-2}\rho_l^{-1} 
\sim Qx_0^{-2m} \sim Q2^{-m}\omega^{2m}(-\log\lambda)^{-m}.$$ 

Therefore, 

(22) $\max_{\theta\in H^m(Q)} \sum_l E(\hat\theta_l - \theta_l)^2 
\sim Q2^{-m}\omega^{2m}(-\log\lambda)^{-m} 
+ 2\sqrt{2}\,\omega^{-1}n^{-1}(-\log\lambda)^{1/2}$. 

The conclusion of the theorem then comes from simple calculations. □ 
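
The relations (19) and (21) can be examined numerically. The sketch below is an illustration under assumptions: it locates a root of (19) to the right of the point where $\lambda\exp(x^2\omega^2/2) = 1$ by bisection (this requires $-\log\lambda > m$), and works with $\log B_\lambda$ to avoid overflow. The function names and the parameter values in the usage note are hypothetical.

```python
import math


def log_B(x, lam, omega, m):
    """log of B_lambda(x) = [1 + exp(-x^2 omega^2 / 2) / lam]^2 * (1 + x^(2m))."""
    z = math.exp(-math.log(lam) - x * x * omega * omega / 2.0)
    return 2.0 * math.log1p(z) + math.log1p(x ** (2 * m))


def stationary_point(lam, omega, m):
    """A root of (19), found by bisection to the right of sqrt(-2 log lam)/omega."""
    def g(x):
        expo = min(math.log(lam) + x * x * omega * omega / 2.0, 700.0)
        rhs = 1.0 + math.exp(expo)                         # right side of (19)
        lhs = omega ** 2 / m * (x * x + x ** (-(2 * m - 2)))  # left side of (19)
        return rhs - lhs

    lo = math.sqrt(2.0 * -math.log(lam)) / omega   # lam * exp(...) = 1, so g(lo) < 0
    hi = 2.0 * lo
    while g(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with `lam = 1e-20`, `omega = 1.0` and `m = 2`, the quantity `(2 * -math.log(lam) / (omega**2 * x0**2)) ** m`, where `x0 = stationary_point(lam, omega, m)`, is the ratio of $x_0^{-2m}$ to its asymptotic form $2^{-m}\omega^{2m}(-\log\lambda)^{-m}$. It lies below 1 and moves toward 1 only slowly as `lam` decreases, reflecting the logarithmic correction hidden in (21).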

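Proof of Theorem 5. By (13), we have 

$$\begin{aligned} 
\sup_{\theta\in H^m(Q)} \sum_l E(\hat\theta_l - \theta_l)^2 
&\le (1 + O(B^{-1}))\sup_{\theta\in H^m(Q)}\min_{\lambda,\omega_s} 
   \Big\{\sum_l E(\hat\theta_l - \theta_l)^2\Big\} + n^{-1}O(B) \\ 
&\le (1 + O(B^{-1}))\min_{\lambda,\omega_s}\sup_{\theta\in H^m(Q)} 
   \Big\{\sum_l E(\hat\theta_l - \theta_l)^2\Big\} + n^{-1}O(B). 
\end{aligned}$$ 

Here the estimator on the left-hand side uses the adaptively chosen 
smoothing parameters, while the estimators inside the braces use the fixed 
pair $(\lambda, \omega_s)$. Similar inequalities hold for 
$H_\omega^\infty(Q)$ and $A_\alpha(Q)$. Now take $B = (\log n)^{1/3}$, 
and the conclusion of the theorem follows from Theorems 1-4 and the fact 
that $\omega_s \in \{\omega_1, \ldots, \omega_S\}$ has finitely many 
possibilities. □ 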